Support phi 3.5 #1800
Conversation
After converting to the
I also received the same exact message when first converting the model into
The latter is really weird because the model card says that it's originally in bfloat16...
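(For reference, one way to confirm what the checkpoint declares is to read the dtype from the Hugging Face config. A minimal sketch; the repo id `microsoft/Phi-3.5-mini-instruct` is an assumption, substitute whichever checkpoint was actually converted:)

```python
# Sketch: read the declared dtype from the Hugging Face config.
# The repo id is an assumption, not taken from this thread.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    trust_remote_code=True,  # older transformers versions may need this for Phi-3.x configs
)
print(config.torch_dtype)  # the model card says bfloat16
```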
See this if it helps:
I quantized Phi-3.5 to int8_bfloat16 and didn't get any error like the one you mentioned above at inference time. Please provide more detail on how to reproduce this and which model you used.
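(For anyone reproducing this, the conversion presumably looks something like the following. This is a sketch assuming CTranslate2's Transformers converter, which the int8_bfloat16 terminology suggests; the repo id and output directory are placeholders, not the reporter's actual script:)

```python
# Sketch of an int8_bfloat16 conversion with CTranslate2's Python converter.
# Model id and output directory are placeholders.
from ctranslate2.converters import TransformersConverter

converter = TransformersConverter(
    "microsoft/Phi-3.5-mini-instruct",  # assumed checkpoint
    trust_remote_code=True,             # may be needed for Phi-3.x on older transformers
)
converter.convert("phi-3.5-ct2", quantization="int8_bfloat16")
```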
Sure... The script I used to convert it is located here: And the script I used to run it is as follows:
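(The scripts themselves are not shown above; the following is only a generic sketch of what a CTranslate2 generation script for this setup usually looks like, not the reporter's code. The model directory, tokenizer repo id, and prompt are placeholders:)

```python
# Generic CTranslate2 generation sketch; paths and repo id are assumptions.
import ctranslate2
import transformers

generator = ctranslate2.Generator("phi-3.5-ct2", device="cuda", compute_type="bfloat16")
tokenizer = transformers.AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")

prompt = "Explain bfloat16 in one sentence."
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

results = generator.generate_batch([tokens], max_length=128)
output_ids = results[0].sequences_ids[0]
print(tokenizer.decode(output_ids))
```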
It seems like you are running this script on a GPU with compute capability < 8.x, which does not support the bfloat16 compute type. Remove this line.
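(One way to act on that advice without hard-coding the compute type is to ask CTranslate2 what the device supports and fall back to the default otherwise. A sketch; the model path is a placeholder:)

```python
# Sketch: only request bfloat16 when the CUDA device actually supports it;
# otherwise let CTranslate2 pick its default compute type.
import ctranslate2

supported = ctranslate2.get_supported_compute_types("cuda")
compute_type = "bfloat16" if "bfloat16" in supported else "default"

generator = ctranslate2.Generator("phi-3.5-ct2", device="cuda", compute_type=compute_type)
```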
Are you referring to the CUDA compute capability? The GPU I'm running it on is an RTX 4090, so it supports compute capability higher than 8...
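(For what it's worth, the capability can be checked directly; an RTX 4090, Ada Lovelace, reports 8.9, which is above the 8.0 threshold for bfloat16. The sketch below uses PyTorch purely for the query, which is an assumption since the thread doesn't mention it:)

```python
# Query the CUDA compute capability of device 0; an RTX 4090 reports (8, 9).
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")  # bfloat16 needs >= 8.0
```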
Do you want me to open a separate issue still? Seems redundant.